Rule extraction for multi bottom-up tree transducers

Author

  • Nina Seemann
Abstract

Ever since the invention of computers, it has been a dream to obtain translations automatically: given a sentence, a machine should return a sentence in another language that expresses the same meaning. In the subfield of statistical machine translation (SMT), this translation is achieved with the help of statistical models. These models use large text collections to automatically learn basic translation units that model the translation of a source sentence into a target sentence. The basic translation units can be single words or phrases consisting of multiple words. Other approaches, called syntax-based SMT, use the rules of a formal grammar as their basic translation units. Syntax-based SMT systems readily allow the use of linguistic annotations, because rules can contain nonterminal symbols that encode such annotations. Furthermore, one can decide whether these annotations are used for both the source and the target language, for one language only, or whether they are excluded altogether. The integration of linguistic annotations has yielded mixed results. In some cases translation quality improves significantly, whereas in others the annotations seem to hurt coverage and thus overall translation quality. While using annotations for both languages generally did not result in good translation quality, using them for one language only showed improvements. However, the best results are often obtained by a syntax-based SMT system that excludes all linguistic annotations. The underlying formal grammars vary vastly in their expressive power. Synchronous context-free grammars are widely used but more powerful for-
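As a minimal illustration of the annotation choices described in the abstract, the Python sketch below represents a synchronous rule whose nonterminals carry labels and replaces those labels by a generic placeholder on the source side, the target side, or both. The class and function names (`NT`, `SyncRule`, `drop_annotations`), the placeholder label `X`, and the English-French example rule are hypothetical and not taken from the paper; the sketch does not implement the thesis's rule extraction for multi bottom-up tree transducers.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NT:
    """A nonterminal: a label (e.g. a syntactic category) plus a link index."""
    label: str
    index: int  # links a source nonterminal to its target counterpart


Symbol = Union[str, NT]  # terminals are plain strings

GENERIC = "X"  # hypothetical placeholder for an unannotated nonterminal


@dataclass(frozen=True)
class SyncRule:
    lhs: Tuple[str, str]          # (source label, target label)
    source: Tuple[Symbol, ...]    # source right-hand side
    target: Tuple[Symbol, ...]    # target right-hand side


def _strip(symbols: Tuple[Symbol, ...]) -> Tuple[Symbol, ...]:
    # Keep terminals and link indices; only the labels are replaced.
    return tuple(NT(GENERIC, s.index) if isinstance(s, NT) else s for s in symbols)


def drop_annotations(rule: SyncRule, source: bool, target: bool) -> SyncRule:
    """Remove linguistic labels on the source side, the target side, or both."""
    src_lhs, tgt_lhs = rule.lhs
    return SyncRule(
        lhs=(GENERIC if source else src_lhs, GENERIC if target else tgt_lhs),
        source=_strip(rule.source) if source else rule.source,
        target=_strip(rule.target) if target else rule.target,
    )


def render(symbols: Tuple[Symbol, ...]) -> str:
    return " ".join(f"{s.label}_{s.index}" if isinstance(s, NT) else s for s in symbols)


# Hypothetical annotated rule: "the JJ_1 NN_2" -> "la NN_2 JJ_1" (adjective reordering).
rule = SyncRule(
    lhs=("NP", "NP"),
    source=("the", NT("JJ", 1), NT("NN", 2)),
    target=("la", NT("NN", 2), NT("JJ", 1)),
)

if __name__ == "__main__":
    settings = {
        "both sides annotated": (False, False),
        "source side only": (False, True),
        "target side only": (True, False),
        "no annotations": (True, True),
    }
    for name, (strip_src, strip_tgt) in settings.items():
        r = drop_annotations(rule, source=strip_src, target=strip_tgt)
        print(f"{name}: {r.lhs[0]} -> {render(r.source)} | {r.lhs[1]} -> {render(r.target)}")
```

Stripping labels while keeping the link indices preserves the rule's reordering behaviour; only the information about which syntactic categories may be substituted is lost, which is one way to picture the trade-off between annotation and coverage mentioned above.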

Similar resources

Extended Multi Bottom-Up Tree Transducers: Composition and Decomposition

Extended multi bottom-up tree transducers are defined and investigated. They are an extension of multi bottom-up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers, even linear ones, can compute all transformations that are computed by linear extended top-down tree transducers, which are a th...

Composition and Decomposition of Extended Multi Bottom-Up Tree Transducers

Extended multi bottom-up tree transducers are defined and investigated. They are an extension of multi bottom-up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers can compute all transformations that are computed by linear extended top-down tree transducers (which are a theoretical model for...

String-to-Tree Multi Bottom-up Tree Transducers

We achieve significant improvements in several syntax-based machine translation experiments using a string-to-tree variant of multi bottom-up tree transducers. Our new parameterized rule extraction algorithm extracts string-to-tree rules that can be discontiguous and non-minimal in contrast to existing algorithms for the tree-to-tree setting. The obtained models significantly outperform the str...

Exact Decoding with Multi Bottom-Up Tree Transducers

We present an experimental statistical tree-to-tree machine translation system based on the multi bottom-up tree transducer, including rule extraction, tuning and decoding. Thanks to input parse forests and a “no pruning” strategy during decoding, the obtained translations are competitive. The drawbacks are a restricted coverage of 70% on test data, in part due to exact input parse tree matching...

A Systematic Evaluation of MBOT in Statistical Machine Translation

Shallow local multi bottom-up tree transducers (MBOTs) have been successfully used as translation models in several settings because of their ability to model discontinuities. In this contribution, several additional settings are explored and evaluated. The first rule extractions for tree-to-tree MBOT with non-minimal rules and for string-to-string MBOT are developed. All existing MBOT systems ...

How to train your multi bottom-up tree transducer

The local multi bottom-up tree transducer is introduced and related to the (non-contiguous) synchronous tree sequence substitution grammar. It is then shown how to obtain a weighted local multi bottom-up tree transducer from a bilingual and biparsed corpus. Finally, the problem of non-preservation of regularity is addressed. Three properties that ensure preservation are introduced, and it is di...

Publication date: 2016